googleAuthR
searchConsoleR
googleAnalyticsR
googleComputeEngineR (Cloudyr)
bigQueryR (Cloudyr)
googleCloudStorageR (Cloudyr)
googleLanguageR (rOpenSci)

Slack group to talk about the packages: #googleAuthRverse
https://www.rocker-project.org/
Maintains useful R images:
rocker/r-ver
rocker/rstudio
rocker/tidyverse
rocker/shiny
rocker/ml-gpu

FROM rocker/tidyverse:3.6.0
MAINTAINER Mark Edmondson (r@sunholo.com)
# install R package dependencies
RUN apt-get update && apt-get install -y \
libssl-dev
## Install packages from CRAN
RUN install2.r --error \
    -r 'http://cran.rstudio.com' \
    googleAuthR \
    googleComputeEngineR \
    googleAnalyticsR \
    searchConsoleR \
    googleCloudStorageR \
    bigQueryR \
    ## install GitHub packages
    && installGithub.r MarkEdmondson1234/youtubeAnalyticsR \
    ## clean up
    && rm -rf /tmp/downloaded_packages/ /tmp/*.rds
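To use a Dockerfile like the one above with gce_vm(), a typical workflow builds the image and pushes it to Google Container Registry. A hedged sketch - the image and project names are placeholders, and it assumes Docker and gcloud are installed and authenticated:

```shell
# build the image from the folder containing the Dockerfile
docker build -t gcr.io/your-project/custom-r .

# push to Google Container Registry so VMs in your project can pull it
docker push gcr.io/your-project/custom-r
```

Alternatively, point a Cloud Build trigger at the GitHub repo so the push happens automatically, as described later in this document.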
Flexible: no need to ask IT to install R everywhere, just run docker run. Cross-cloud, ascendant tech
Version controlled: no worries that new package releases will break code
Scalable: run multiple Docker containers at once; fits into an event-driven, stateless, serverless future
Continuous development with GitHub pushes
Pros
Probably run the same code with no changes needed
Easy to set up
Cons
Expensive
May be better to have the data in a database
3.75TB of RAM: $423 a day
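At that price, shutting the machine down when idle matters: a stopped VM stops accruing compute charges (only its disk is billed) and can be restarted with its state intact. A minimal sketch using googleComputeEngineR, assuming the VM is named "big-mem" as in the example that follows:

```r
library(googleComputeEngineR)

# stop the VM when you are done - compute billing stops, disk persists
gce_vm_stop("big-mem")

# ...later, pick up where you left off
job <- gce_vm_start("big-mem")
```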
library(googleComputeEngineR)
# this will cost a lot
bigmem <- gce_vm("big-mem",
                 template = "rstudio",
                 predefined_type = "n1-ultramem-160")

library(googleComputeEngineR)
custom_image <- gce_tag_container("custom-shiny-app",
                                  "your-project")

## make new Shiny template VM for your self-contained Shiny app
vm <- gce_vm("myapp",
             template = "shiny",
             predefined_type = "n1-standard-2",
             dynamic_image = custom_image)

Get data in/out with googleCloudStorageR or bigQueryR

Pros
Fault redundancy
Forces repeatable/reproducible infrastructure
library(future) makes parallel processing very usable
Cons
Changes to your code for split-map-reduce
Need to write meta-code to handle moving data and code in and out
Not applicable to some problems
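That I/O meta-code typically leans on googleCloudStorageR, with a bucket as the shared staging area between your machine and the worker VMs. A hedged sketch - the bucket and file names are placeholders:

```r
library(googleCloudStorageR)

# from your machine: stage the input data for the workers
gcs_upload("my_data.csv", bucket = "my-bucket", name = "input/my_data.csv")

# inside a worker: fetch the input, do the work, write results back
gcs_get_object("input/my_data.csv", bucket = "my-bucket",
               saveToDisk = "my_data.csv")
gcs_upload("results.rds", bucket = "my-bucket", name = "output/results.rds")
```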
New in googleComputeEngineR v0.3 - a shortcut that launches a cluster and checks authentication for you
library(googleComputeEngineR)
vms <- gce_vm_cluster()
#2019-03-29 23:24:54> # Creating cluster with these arguments: template = r-base, dynamic_image = rocker/r-parallel, wait = FALSE, predefined_type = n1-standard-1
#2019-03-29 23:25:10> Operation running...
...
#2019-03-29 23:25:25> r-cluster-1 VM running
#2019-03-29 23:25:27> r-cluster-2 VM running
#2019-03-29 23:25:29> r-cluster-3 VM running
...
#2019-03-29 23:25:53> # Testing cluster:
#r-cluster-1 ssh working
#r-cluster-2 ssh working
#r-cluster-3 ssh working

googleComputeEngineR has a custom method for future::as.cluster()
## make a future cluster
library(future)
library(googleComputeEngineR)
vms <- gce_vm_cluster()
plan(cluster, workers = as.cluster(vms))
# ...do parallel work...

# create cluster
vms <- gce_vm_cluster("r-vm", cluster_size = 3)
plan(cluster, workers = as.cluster(vms))

# get data
my_files <- list.files("myfolder", full.names = TRUE)
my_data <- lapply(my_files, read.csv)

# forecast data in cluster
library(forecast)
library(future.apply)
cluster_f <- function(my_data, args = 4){
  forecast(auto.arima(ts(my_data, frequency = args)))
}
result <- future_lapply(my_data, cluster_f, args = 4)

Can multi-layer future loops (use each CPU within each VM)
Thanks to Grant McDermott for figuring out the optimal method (Issue #129)
future_sim <-
  ## Outer future_lapply() call loops over the no. of VMs
  future_lapply(1:length(vms), FUN = function(x) {
    ## Inner future_lapply() call loops over desired no. of iterations / no. of VMs
    future_lapply(1:(iters/length(vms)), FUN = slow_func)
  })

3 VMs, 8 CPUs each = 24 threads
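For the nested loops above to use every CPU, the future plan needs two levels: the outer level distributes across VMs, the inner level across the cores of each VM. A sketch using the future package's nested-topology API, assuming vms came from gce_vm_cluster() as earlier:

```r
library(future)
library(future.apply)

# outer level: one worker per VM; inner level: one worker per core on that VM
plan(list(
  tweak(cluster, workers = as.cluster(vms)),
  multisession
))
```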
Clusters of VMs + Docker = Horizontal scaling
Clusters of VMs + Docker + Task controller = Kubernetes
Pros
Auto-scaling, task queues etc.
Scale to billions
Potentially cheaper
May already have cluster in your organisation
Cons
Needs stateless, idempotent workflows
Message broker?
Minimum 3 VMs
Built on Cloud Build upon each GitHub push:
FROM rocker/shiny
MAINTAINER Mark Edmondson (r@sunholo.com)
# install R package dependencies
RUN apt-get update && apt-get install -y \
libssl-dev
## Install packages from CRAN needed for your app
RUN install2.r --error \
-r 'http://cran.rstudio.com' \
googleAuthR \
googleAnalyticsR
## assume shiny app is in build folder /shiny
COPY ./shiny/ /srv/shiny-server/myapp/
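The "built on Cloud Build upon each GitHub push" step is driven by a cloudbuild.yaml checked into the repo alongside the Dockerfile. A hedged sketch, reusing the shiny-googleauthrdemo image name from the kubectl command below:

```yaml
steps:
- name: 'gcr.io/cloud-builders/docker'
  args: ['build', '-t', 'gcr.io/$PROJECT_ID/shiny-googleauthrdemo', '.']
images:
- 'gcr.io/$PROJECT_ID/shiny-googleauthrdemo'
```

A Cloud Build trigger pointed at the GitHub repo then rebuilds and pushes the image on every push.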
Shiny App:
kubectl run shiny1 \
--image gcr.io/gcer-public/shiny-googleauthrdemo:latest \
--port 3838
kubectl expose deployment shiny1 \
--target-port=3838 --type=NodePort
Built on Cloud Build on every GitHub push:
FROM trestletech/plumber
# copy your plumbed R script
COPY api.R /api.R
# default is to run the plumbed script
CMD ["api.R"]
R plumber API:
kubectl run my-plumber \
--image gcr.io/your-project/my-plumber \
--port 8000
kubectl expose deployment my-plumber \
--target-port=8000 --type=NodePort
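Before the Ingress is in place, the NodePort that Kubernetes assigned to the service can be checked with standard kubectl:

```shell
kubectl get service my-plumber
```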
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: r-ingress-nginx
spec:
  rules:
  - http:
      paths:
      - path: /gar/
        # app deployed to /gar/shiny/
        backend:
          serviceName: shiny1
          servicePort: 3838
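The curl call below hits an /api/ path, which implies a second rule routing to the my-plumber service. A hedged fragment, indented as it would sit under the paths: list, with the service port assumed to match the 8000 exposed earlier:

```yaml
      - path: /api/
        backend:
          serviceName: my-plumber
          servicePort: 8000
```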
curl 'http://mydomain.com/api/echo?msg="its alive!"'
#> "The message is: its alive!"
A 40-minute talk at Google Next '19 with lots of new things to try!
https://www.youtube.com/watch?v=XpNVixSN-Mg&feature=youtu.be
A great video that goes further into Spark clusters, Jupyter notebooks, training with ML Engine, and scaling with Seldon on Kubernetes, which I haven't tried yet